We present variability analysis of data from the Northern Sky Variability Survey
(NSVS). Using the clustering method which defines variable candidates as
outliers from large clusters, we cluster 16,189,040 light curves, having data points 
at more than 15 epochs, as variable and non-variable candidates in 638 NSVS 
fields. Variable candidates are selected depending on how strongly they are sep- 
arated from the largest cluster and how rarely they are grouped together in eight 
dimensional space spanned by variability indices. All NSVS light curves are also 
cross-correlated to the Infrared Astronomical Satellite, AKARI, Two Micron All
Sky Survey, Sloan Digital Sky Survey (SDSS), and Galaxy Evolution Explorer 
objects as well as known objects in the SIMBAD database. The variability anal- 
ysis and cross-correlation results are provided in a public online database which 
can be used to select interesting objects for further investigation. Adopting con- 
servative selection criteria for variable candidates, we find about 1.8 million light 
curves as possible variable candidates in the NSVS data, corresponding to about 
10% of our entire NSVS samples. Multi-wavelength colors help us find specific 
types of variability among the variable candidates. Moreover, we also use mor- 
phological classification from other surveys such as SDSS to suppress spurious 
cases caused by blending objects or extended sources due to the low angular 
resolution of the NSVS. 

Subject headings: astronomical databases: miscellaneous - methods: data anal- 
ysis - methods: statistical - stars: variables: general 



1. Introduction 

Emerging projects in time-domain astronomy produce a large amount of time-series
data, and allow discoveries of new variable sources and better understanding of known
variability types (see Paczyński 2000; Djorgovski et al. 2001; Bono, Trevese, & Turatto
2003, for a review). This new era needs computationally intensive processing of massive
time-series data with computational algorithms to recover interesting objects with a
broad range of variability types (Eyer 2006).



Investigating time variability of astronomical objects begins with detecting any signifi- 
cant changes in brightness. Detection methods can be optimized to specific variability types. 



[...]ing, and other variable sources (e.g., Alard & Lupton [...] 2004; Corwin et al.
2006; Yuan & Akerlof 2008). Instead of detecting variability in images, recognizing
variability in light curves can also be tailored to a particular type of variability
signal such as transit (e.g., Protopapas, Jimenez, & Alcock 2005; Renner et al. 2008).



We presented a new framework of detecting general variability types in massive
time-series data by using a non-parametric infinite Gaussian mixture model (GMM) in our
previous paper (Shin, Sekora, & Byun 2009, hereafter Paper I). In our approach, variable
objects are considered as outliers from non-variable objects, which should constitute a
dominant fraction of given data, in the multi-dimensional space spanned by several
variability indices. In the results from the infinite GMM, where each group is described
by a multivariate Gaussian distribution (Robert 1996), we recognize large groups as
groups of non-variable objects, tagging outliers from these large groups as possible
candidates of variable objects (see Liao 2005 for other methods of clustering). The
strength of our approach is based on the assumption
that non-variable objects, which do not have enough signals of variable phenomena in their 
light curves, represent the dominant fraction of data and share the same systematic effects 
hidden in the given data such as sampling patterns and noise properties. Therefore, by 
extracting common properties of dominant non-variable objects from the given data, our 
approach can be less biased than choosing specific types of variability with assumptions of 
systematic patterns. 

In this paper, the second of a series of papers, we apply our methodology to all data
from the Northern Sky Variability Survey (NSVS; Woźniak et al. 2004a). The NSVS catalog
includes about 14 million^1 objects with the optical magnitude ranging from 8 to 15.5
and declinations higher than -38 deg. These objects are so bright that deeper imaging
surveys such as Pan-STARRS (Kaiser et al. 2010) and Large Synoptic Survey Telescope
(Tyson 2002) cannot produce useful photometric data due to saturation in their normal
observation modes. Therefore, variability analysis of the current NSVS data will still
be useful for investigation of bright stars even after deeper images are acquired in the
future
surveys. Moreover, the NSVS data have not been fully exploited because several previous
trials have been focused on particular types of variable objects with specific criteria
for variability signals (e.g., Woźniak et al. 2004b; Nicholson, Sutherland, & Sutherland
2005; Kinemuchi et al. 2006; Wils, Lloyd, & Bernhard 2006; Kelley & Shaw 2007; Kiss et
al. 2007; Hoffman et al. 2008; Usatov & Nosulchik 2008; Dimitrov 2009; Schmidt et al.
2009; Hoffman, Harrison, & McNamara 2009).



Most NSVS objects are already included in other previous surveys. In particular, the
NSVS catalog can cover most of the optical magnitude range of objects detected in IRAS
(Clegg 1980), Two Micron All Sky Survey (2MASS; Kleinmann 1992), and Galaxy Evolution
Explorer (GALEX; Bianchi & The GALEX Team 1999) observations. Since colors can be used
to identify the particular types of spectral energy distributions, associating
variability analysis to colors can be an effective way to discover new variable sources
at a specific phase of stellar evolution such as Mira-type variables (e.g., Pojmański &
Maciejewski 2005) or extragalactic variable objects such as quasars (e.g., Bianchi et
al. 2007). Associating the NSVS objects with other optical surveys is also critically
important. Since the NSVS data were taken with small telescopes and do not have a high
angular resolution, a large fraction of objects might be affected by blending, poor
positioning, and incorrect identification of extended objects as stars in the NSVS data.
Our new variability analysis of the whole NSVS data will help others find specific
variable sources by combining their data and these other multi-wavelength surveys.

^1 Because some NSVS objects are included in multiple separate light curves from
different observation fields, the total number of light curves is larger than that of
objects.

This paper is organized as follows. In Section 2, we explain the application of the
infinite GMM to the NSVS data. In Section 3, we describe how to extract variable
candidates by using the results from the GMM. In Section 4, we explain the Web database
of our variability analysis. Properties of variable candidates are explained in Section
5, considering their associations to archival data. Finally, a summary and discussion
are given in the last section.



2. Method 



2.1. Data and variability indices 



We extract 16,189,040 light curves from the NSVS data (Woźniak et al. 2004a) by limiting
samples to those that have more than 15 good photometric data points. The good
photometric points are defined as not having bad photometry flags: SATURATED, NOCORR,
LONPTS, HISCAT, HICORR, HISIGCORR, and RADECFLIP (see Woźniak et al. 2004a, for meanings
of the flags). Since one object can be included as several light curves in different
observation fields in the NSVS catalog, the number of objects is slightly smaller than that 
of the light curves in our sample. Here, we consider each light curve as a separate entity. 
Among 644 observation fields of the NSVS data, six fields (123c, 145c, 146c, 147c, 156c, 
156d) do not have any light curves having more than 15 good photometric data points. 

In addition to six variability indices used in Paper I, we also derive skewness and
kurtosis from each light curve. The definitions of these indices are summarized in Table
1. $\sigma/\mu$, $\gamma_1$, and $\gamma_2$ are not sensitive to the structure of light
curves. However, estimating them is computationally cheap, and they easily describe
simple low-order patterns of light curves. We note that our definitions of skewness and
kurtosis are not the same as the traditional measures (see Joanes & Gill 1998, for a
discussion). The other five indices describe more complex patterns in light curves. Con
represents how many sets of three consecutive data points are at least $2\sigma$ fainter
or brighter than the median magnitude, tracing continuous variations in light curves
(Woźniak 2000). $\eta$ measures the ratio of the mean square successive difference to
the sample variance (von Neumann 1941). J and K have been commonly used for multi-band
light curves although these can be estimated for single-band light curves (Stetson
1996). Here, we use only single-band light curves with a slightly modified definition
which uses sequential pairs of data points. Finally, we also use the analysis of
variance (ANOVA) statistic which is useful for identifying periodic signals
(Schwarzenberg-Czerny 1996; Shin & Byun 2004). The maximum value of the ANOVA (AoVM) is
used to measure the strength of periodicity. Even with an incorrect period of the light
curve, AoVM can be a valuable quantity that indicates periodicity (Shin & Byun 2007).
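As a rough illustration of the simpler indices, the sketch below computes the ratio of
standard deviation to mean, skewness, kurtosis, the von Neumann ratio, and a Con-style
count for one light curve. It is only a sketch under stated assumptions: the paper's own
definitions are those of Table 1 (and its skewness and kurtosis differ from the
traditional measures), whereas this code uses textbook moment-based forms, normalizes
Con by the number of triplets, and uses the hypothetical function name
`variability_indices`.

```python
import statistics


def variability_indices(mags):
    """Return a few simple variability indices for one light curve.

    mags: list of magnitudes in time order, assumed already cleaned of
    flagged points. Definitions are textbook ones (an assumption), not
    necessarily those of Table 1.
    """
    n = len(mags)
    mean = statistics.fmean(mags)
    sigma = statistics.stdev(mags)      # sample standard deviation
    median = statistics.median(mags)

    # Moment-based skewness and excess kurtosis (population moments).
    m2 = sum((m - mean) ** 2 for m in mags) / n
    m3 = sum((m - mean) ** 3 for m in mags) / n
    m4 = sum((m - mean) ** 4 for m in mags) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0

    # von Neumann ratio: mean square successive difference / variance.
    mssd = sum((b - a) ** 2 for a, b in zip(mags, mags[1:])) / (n - 1)
    eta = mssd / sigma ** 2

    # Con: fraction of consecutive triplets lying entirely at least
    # 2 sigma above, or entirely at least 2 sigma below, the median.
    hits = 0
    for i in range(n - 2):
        trio = mags[i:i + 3]
        above = all(m >= median + 2 * sigma for m in trio)
        below = all(m <= median - 2 * sigma for m in trio)
        if above or below:
            hits += 1
    con = hits / (n - 2)

    return {"sigma_over_mu": sigma / mean, "skew": skew,
            "kurt": kurt, "eta": eta, "con": con}
```

With this normalization, uncorrelated Gaussian noise gives a von Neumann ratio near 2,
while smooth, slowly varying light curves give values well below 2.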



2.2. GMM and results 

We follow the same procedure of the infinite GMM as in our Paper I. The GMM is 
derived for each NSVS observation field with the eight variability indices. Even though 
the GMM converges much earlier than 100 iterations, we conduct 100 iterations as shown 
in Paper I. Since finding large groups is our main concern to find out non-variable objects 
from the samples, the number of iterations is not an important factor which affects how 
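many small groups can be recovered. Each GMM component^2 is described by a multivariate
Gaussian distribution with its mean, i.e. center, and covariance matrix:

$$ p(\mathbf{x} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\boldsymbol{\Sigma}_m|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_m)^{T} \boldsymbol{\Sigma}_m^{-1} (\mathbf{x} - \boldsymbol{\mu}_m) \right], \qquad (1) $$

where m is an index of a mixture component,
$\mathbf{x} = (\sigma/\mu, \gamma_1, \gamma_2, \mathrm{Con}, \eta, J, K, \mathrm{AoVM})$
is a vector of parameters, and D is the number of parameters (in our case D = 8).
Furthermore, $\boldsymbol{\mu}_m$ is the vector of mean values (i.e., mixture centers),
and $\boldsymbol{\Sigma}_m$ is the covariance matrix of the Gaussian distribution.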

Figure 1 shows how many groups are recovered with the infinite GMM and how many light
curves are included in large groups. Since a small number of observation epochs provides
poor sampling of light curves and has a low probability of detecting variability, fields
having few observation epochs (i.e., few observed frames) are expected to have a small
number of groups, while large groups dominate the whole population of light curves. In
fields with a large number of light curves, the infinite GMM recovers many groups
because these fields are likely to include various kinds of variable light curves. But
the fraction of data included in the large groups does not change simply as a function
of the number of light curves because the total number of light curves in the field does
not affect the probability that a single light curve represents a non-variable object.

^2 We use component as the same term as cluster and group in this paper. Component is a
specific term for the GMM.

We measure the Davies-Bouldin (DB) index in each field to check systematic differences
of the GMM results. The DB index is commonly used to measure the compactness of clusters
and separations among them (Davies & Bouldin 1979; Vendramin, Campello, & Hruschka
2010). The index is defined as

$$ \mathrm{DB} = \frac{1}{n_c} \sum_{i=1}^{n_c} \max_{j \neq i} \left( \frac{S_i + S_j}{d_{ij}} \right), \qquad (2) $$

where $n_c$ is the number of clusters, $S_i$ is the average distance of data points
included in the cluster i with respect to its center, and $d_{ij}$ is the distance
between centers of two clusters i and j. Distance is defined as the norm which is one of
the simple measurements (see Yu et al. 2006, for a discussion). As each cluster becomes
more compact and better separated from the others, this DB index decreases.
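The DB index can be sketched in a few lines. The code below is illustrative only: it
assumes the Euclidean norm as the distance, uses the sample mean of each cluster as its
center (rather than the fitted Gaussian mean), and the function name `davies_bouldin` is
hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for labelled points (Euclidean distance assumed)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Cluster centers and mean member-to-center distances S_i.
    centers, scatter = {}, {}
    for lab, members in clusters.items():
        dim = len(members[0])
        c = [sum(m[k] for m in members) / len(members) for k in range(dim)]
        centers[lab] = c
        scatter[lab] = sum(math.dist(m, c) for m in members) / len(members)

    # For each cluster take the worst (largest) similarity to any other one.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters if j != i
        )
    return total / len(clusters)
```

For two clusters with unit scatter whose centers are 10 units apart, this returns 0.2;
moving the centers closer or broadening the clusters raises the value.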

Strong systematic differences of the DB index do not appear among the 638 NSVS fields 
as shown in Figure 2. In most fields, the groups in the GMM results have a DB index smaller
than 10. We do not find any strong systematic dependence of the DB index on the number 
of frames and the number of light curves in each field. The highest DB index ~ 29 is found 
in the field 100c which has a small number of frames and light curves. Moreover, in that 
field the number fraction of the top three groups is highest among the sample fields. 



2.3. Largest cluster 

Center coordinates of the largest cluster show a significant variation for different fields. 
Since each NSVS field has different characteristics, it is not surprising to see this variation 
shown in Figure 3. Importantly, $\gamma_1$ and $\gamma_2$ are not close to zero, in
contrast to the expectation from assuming a normal distribution for light curves. These
non-zero $\gamma_1$ and $\gamma_2$ imply that a dominant fraction of light curves are
not well described by a normal distribution due to sampling or systematic observational
effects. The same implication is also found from $\eta$, which has an expected average
of ~ 2 for a normal distribution (Williams 1941). Figure 3 demonstrates the importance
of including several variability indices which catch different features of light curves
with different sensitivities, as we showed in Paper I.



We also examine covariance matrices of the largest cluster in each field. Comparing the
absolute values of the covariance matrix elements, the variances of J or AoVM have
maximum values in the largest cluster of all 638 fields. In 604 fields, the variance of
J is the maximum and ranges from about 20 to 800. In the remaining 34 fields, the
variance of AoVM is the maximum and ranges from about 20 to 400. The largest clusters in
these 34 fields also generally have a higher central value of AoVM than in other fields.
In the covariance matrix of the largest cluster, the covariance between $\eta$ and J has
the most negative value, from about -6 to 0. Only a minor fraction of fields show the
most negative covariance between $\eta$ and AoVM. As shown in Paper I, a substantial
number of light curves exhibit negative correlations between $\eta$ and J, or between
$\eta$ and AoVM in the NSVS data, implying systematic properties of the NSVS data.



3. Separation of variable candidates 

One disadvantage of clustering algorithms is that there is no useful validation process 
for clustering results as in other unsupervised learning methods. Our approach using the 
infinite GMM also shares this problem with other clustering methods. Therefore, there is no 
one absolutely right way to define a boundary between variable and non-variable objects in
multi-dimensional space with the clustering results. In many situations, the selection rule of
variable objects can be limited by practical issues such as the number of objects which can 
be investigated further in follow-up studies. Here, we suggest three possible ways to separate 
variable objects from non-variable objects by using clustering results. 

As suggested in Paper I, we can first select as variable candidates those objects which
are not included in large groups^3. Figure 1 shows that the largest group in each field
includes a different number of objects. Therefore, defining objects in the largest group
as non-variable objects produces different numbers of variable candidates in each field.
Moreover, if there were any systematic patterns in light curves, or if the majority of
non-variable light curves were not well described by a multivariate Gaussian
distribution, data in each field would make the second or third largest groups include
non-variable objects or objects affected by the systematic patterns in their light
curves. Basically, the existence of small clusters, i.e., the exact number of clusters,
is not an important factor in selecting variable candidates.

Figure 4 presents the change of cumulative fractions of data included in groups as the
number of large groups increases. Since we find only six clusters in the field 147d, the
fraction of data included in the top four groups is about 99%. However, the median
fraction of data included in the top four large groups is about 89%. To select variable
candidates from minor groups in each field, one could adopt a 90% cut of the cumulative
fraction and avoid light curves included in large groups.

^3 We warn that this approach does not work if the most dominant fraction of light
curves corresponds to variable sources.
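The cumulative-fraction cut described above can be sketched as follows. This is a
minimal illustration rather than the procedure used for the database: the function name
`large_group_labels`, the input format (one group label per light curve), and the
tie-breaking between equally sized groups are all assumptions.

```python
from collections import Counter


def large_group_labels(labels, fraction=0.90):
    """Return the labels of the largest groups whose cumulative share of
    the data first reaches `fraction`, plus the cumulative shares."""
    counts = Counter(labels).most_common()   # largest group first
    total = len(labels)
    chosen, cumulative, running = [], [], 0
    for lab, n in counts:
        running += n
        chosen.append(lab)
        cumulative.append(running / total)
        if running / total >= fraction:
            break
    return chosen, cumulative
```

Light curves whose group label is not in the returned list would then be treated as
variable candidates under this rule.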

Another possible approach is using distances of objects from the center of the largest
group in the eight-dimensional space spanned by the variability indices. In Paper I, we
introduced the Mahalanobis distance from the largest cluster:

$$ D_M = \sqrt{ (\mathbf{x} - \boldsymbol{\mu}_0)^{T} \boldsymbol{\Sigma}_0^{-1} (\mathbf{x} - \boldsymbol{\mu}_0) }, \qquad (3) $$

where the center $\boldsymbol{\mu}_0$ and covariance matrix $\boldsymbol{\Sigma}_0$ of
the largest cluster are used with the position of an individual object $\mathbf{x}$ in
the eight-dimensional space. Unlike a commonly used Euclidean distance, $D_M$ depends on
$\boldsymbol{\Sigma}_0$ which describes how broadly the objects in the largest cluster
disperse (Sharma & Johnston 2009). Objects with large distances can be considered as
variable objects because they have strong dissimilarity from the dominant fraction of
light curves (e.g., Jolion, Meer, & Bataouche 1991). In Figure 5, the distribution of
$D_M$ for all objects with respect to the largest cluster in each field shows a peak
around $D_M \sim 2$, corresponding to the mode value of the beta distribution which is
expected for the distribution of $D_M$ (Ververidis & Kotropoulos 2008). It also shows
another concentration of objects between $D_M \sim$ 10^[...] and 10^[...], implying the
possible signature of a large number of variable candidates or any systematic patterns
hidden in the NSVS light curves. This feature varies strongly in each field as shown for
the fields 088d and 147d in Figure 5.
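A small sketch of the Mahalanobis distance may help make the contrast with the Euclidean
distance concrete. It assumes the usual square-root form of the distance, solves the
linear system with plain Gaussian elimination instead of forming the inverse covariance
matrix, and uses hypothetical helper names (`solve`, `mahalanobis`); it is not the code
used for the database.

```python
import math


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def mahalanobis(x, mu0, sigma0):
    """Distance of point x from a cluster with center mu0 and covariance sigma0."""
    diff = [xi - mi for xi, mi in zip(x, mu0)]
    y = solve(sigma0, diff)          # y = sigma0^{-1} (x - mu0)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))
```

With a covariance of diag(4, 1), the points (2, 0) and (0, 2) are equally far from the
origin in the Euclidean sense, but their Mahalanobis distances are 1 and 2, because the
cluster is broader along the first axis.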

We can derive the cut of $D_M$ that includes a specific fraction of members in the
largest cluster by integrating the multivariate Gaussian distribution with the largest
cluster's center $\boldsymbol{\mu}_0$ and covariance matrix $\boldsymbol{\Sigma}_0$. For
example, the b% cut of $D_M$ can be found with

$$ \int_{\mathbf{x}:\, D_M(\mathbf{x}) < D_M^{\rm cut}} p(\mathbf{x} \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) \, d\mathbf{x} = b, \qquad (4) $$

where $p(\mathbf{x})$ is a multivariate Gaussian distribution. Practically, the
integration can be estimated by using the Monte Carlo method (Chen, Morris, & Martin
2006). Each field has a different value of $D_M^{\rm cut}$ for the same probability cut.
For example, the 99% cut in the field 088d is $D_M^{\rm cut} = 5.634$, while
$D_M^{\rm cut} = 5.105$ corresponds to the 99% cut in the field 147d. Since the largest
cluster has the sharper concentration of $D_M$ in the field 147d than in the field 088d
(see Figure 5), $D_M^{\rm cut}$ in the field 147d is smaller than that in the field 088d.
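A Monte Carlo estimate of such a cut can be sketched for the idealized case in which
samples are drawn exactly from the fitted Gaussian. This is an assumption made for
illustration, and the function name `mc_distance_cut` is hypothetical. Under this
idealization the estimate depends only on the dimension and the probability, so it is
not expected to reproduce the field-dependent values quoted above, which presumably
reflect details of the integration that are not given here.

```python
import math
import random


def mc_distance_cut(dim, prob, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Mahalanobis distance enclosing `prob`
    of a `dim`-dimensional Gaussian."""
    rng = random.Random(seed)
    # For x drawn from N(mu0, Sigma0), the Mahalanobis distance of x equals
    # the Euclidean norm of a standard-normal vector, so mu0 and Sigma0
    # drop out of the sampling.
    dists = sorted(
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(n_samples)
    )
    index = min(n_samples - 1, int(math.ceil(prob * n_samples)) - 1)
    return dists[index]
```

For example, `mc_distance_cut(8, 0.99)` should land close to the square root of the 99%
quantile of a chi-square distribution with eight degrees of freedom, i.e. roughly 4.5.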

These two approaches have their own problems. Avoiding large groups does not guarantee
that minor groups are well separated from the largest group. In a multi-dimensional
space, a few minor groups can be close to the large groups with a small $D_M$. When
selecting objects with $D_M > D_M^{\rm cut}$ as variable candidates, those objects with
a large $D_M$ can form large groups representing systematic observational features in
each field. Although Figure 5 implies that members of most large groups have a small
$D_M$, using only one method can cause more contamination by non-variable objects or
objects dominated by systematic patterns in each observation field.

We can define variable candidates conservatively by combining both methods. For example,
we find objects having a $D_M$ larger than the $D_M^{\rm cut}$ of the 99% cut and not
being included in the top four large groups. Figure 6 presents the number fraction of
objects selected by this conservative method compared to results of selecting objects
with the $D_M^{\rm cut}$ of the 99% cut. We use this conservative selection of variable
candidates in Section 5. Because the top four large groups are generally closely located
in the eight-dimensional space, excluding the members of the top four large groups from
the variable candidates usually determines the number of variable candidates.
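The combined rule amounts to a logical AND of the two criteria. The sketch below assumes
that each light curve already has a group label and a distance from the largest cluster;
the function name `conservative_candidates` and the input layout are hypothetical.

```python
def conservative_candidates(group_ids, distances, top_groups, cut):
    """Indices of light curves that are outside the listed large groups
    and also farther than `cut` from the largest cluster."""
    top = set(top_groups)
    return [
        i for i, (g, d) in enumerate(zip(group_ids, distances))
        if g not in top and d > cut
    ]
```

For instance, with the top four groups labelled 0 to 3 and a cut of 5.634, a light curve
in group 1 with a distance of 6.0 is rejected, while one in group 5 with a distance of
6.2 is kept.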



4. Database



We provide the results of our variability analysis and clustering for all sample light
curves online. The database stores the eight variability indices, the cluster
identification number in each field, and $D_M$ from the largest cluster for every light
curve, as well as the basic information of the light curves such as the NSVS object id
and coordinates. The database is supplemented with the number of light curves analyzed,
the number of groups found by clustering, the number fractions of group members, and the
99%, 95%, and 90% $D_M$ cuts from the largest group in each field. Therefore, users of
the database can select variable candidates by using these clustering results with their
own selection rules.

The database is also supplemented by association to other astronomical catalogs. All
objects analyzed are cross-matched to the SIMBAD database, 2MASS All-Sky Catalog of
Point Sources (Skrutskie et al. 2006), the photometric catalog of the Sloan Digital Sky
Survey (SDSS) Data Release 7 (Abazajian et al. 2009), and the photometric catalog of the
GALEX GR4/GR5 Data Release (Morrissey et al. 2007) with the NSVS coordinates and a
search radius of 6". When multiple objects are matched within the search radius, the
nearest match is associated with the NSVS object. In addition to these catalogs, we also
present matching results of the NSVS coordinates to IRAS (Helou & Walker 1988; Moshir et
al. 1990) and AKARI (Murakami et al. 2007) catalogs as separate files online. This
association to other catalogs can be used to identify the morphology of the NSVS objects
as galaxies or stars, to find astronomical types of known objects, to check blending
effects with neighbor objects, and to estimate colors of objects as we show in the
following section. In particular, because the spatial resolution of the NSVS data is
much worse than that of the SDSS and 2MASS catalogs, the morphological information from
these catalogs can help database users to avoid blended objects and less precise
photometry in the NSVS catalog.
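The nearest-match rule within a 6" radius can be illustrated with a brute-force sketch.
The function names and the `(name, ra_deg, dec_deg)` catalog layout are assumptions made
for this example; a production cross-match over millions of objects would use a spatial
index rather than a linear scan.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def nearest_match(ra, dec, catalog, radius_arcsec=6.0):
    """Return (entry, separation) of the nearest catalog entry within the
    radius, or None. `catalog` is an iterable of (name, ra_deg, dec_deg)."""
    best = None
    for entry in catalog:
        sep = angular_separation_arcsec(ra, dec, entry[1], entry[2])
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (entry, sep)
    return best
```

If two catalog entries lie 5" and 2" from the NSVS position, only the one at 2" is
associated; an entry at 10" is ignored.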

The database can be accessed through the Web interface^4. Searching the light curves
and their variability analysis is possible with equatorial coordinates and a search
radius given by users. In particular, the simple cone search interface^5 is also
provided for compatibility with the Virtual Observatory environment (Williams et al.
2008). We plan to provide the basic components of the database in VizieR^6 too.



5. Properties of variable candidates 



In this section, we examine properties of variable candidates which are not included in
the top four groups and have $D_M$ larger than the 99% cut. With these conservative
selection criteria, we find 1,840,310 light curves as possible variable candidates. We
emphasize that this selection of variable candidates is highly conservative. When we
select variable candidates simply with $D_M$ larger than the 95% or 99% cut, the total
number of variable candidates is 6,640,387 or 5,826,587, which is about 3.6 or 3.1 times
more than the number of variable candidates selected with the conservative definition,
respectively. Meanwhile, when we select objects not included in the top four groups as
variable candidates, the number of candidates is 1,918,580, which is similar to the
result of the conservative selection. We note, however, that this selection of variable
candidates is still affected by the intrinsic limits of the NSVS data such as blending
effects and completely different uncertainty properties of photometry for extended
objects compared to stellar objects.



5.1. Known objects and new candidates 

We search all NSVS samples with the SIMBAD database to recognize any known objects and
their basic properties such as well-known names. As explained in Section 4, the nearest
SIMBAD object around the coordinates of the NSVS objects is retrieved with a search
radius of 6". Since the NSVS data do not have information about morphological
classification such as galaxy and star, the auxiliary information of the SIMBAD database
is useful to sort out spurious variability of extended objects. About 6% of the NSVS
objects are matched with at least one SIMBAD object. But this fraction is affected by
the precision of the NSVS coordinates.

^4 http://stardb.yonsei.ac.kr
^5 http://stardb.yonsei.ac.kr/conesearch/nsvs_conesearch.php
^6 http://vizier.u-strasbg.fr/viz-bin/VizieR

Based on the object classification given in the SIMBAD database^7, we find that 16,061
NSVS light curves correspond to known or suspected variable stars. Considering only
variable candidates with our conservative selection criteria, the number of known or
suspected variable stars is 11,080 among our variable candidates. Finally, excluding
known or suspected variable stars as well as known galaxies in the SIMBAD database, the
number of new variable candidates is 1,824,123 with our conservative selection.
Hereafter, our investigation of variable candidates is limited to these 1,824,123 NSVS
objects. We note that uncertain coordinates in either the NSVS or the SIMBAD database
can cause us to miss some known galaxies and known or suspected variables. The
classifications in the SIMBAD database might not be as complete as other catalogs of
variable stars such as the AAVSO International Variable Star Index^8 (Watson 2006), as
we discuss in the Appendix.



Figure 7 shows light curves of six example NSVS objects which are not included in our
conservative selection of variable candidates, but which are known or suspected
variables in the SIMBAD database. These examples are among 11 such objects in the NSVS
field 112a, corresponding to a part of the constellation Aquila. If we selected variable
candidates as objects with $D_M$ larger than the 95% cut, four of the 11 objects,
including GSC 00490-04680 (Bernhard & Lloyd 2000) and NSV 12564 (Kinnunen & Skiff 2000)
in Figure 7, would be included in variable candidates. The field 064d misses the largest
number of known or suspected variable stars (236 objects) with our conservative
selection of candidates. However, its fraction is only 0.6% with respect to the total
number of objects that are analyzed by our clustering method. We note that most of these
missing objects are identified as variable objects in the Kepler field by HATNET, which
uses the image subtraction method (Hartman et al. 2004).



Figure 8 presents example light curves of known or suspected variable stars matched with
the SIMBAD database and included in our conservative selection of variable candidates for 
the NSVS field 112a. We find 88 known or suspected variable stars among our 6,389 variable 
candidates in the field 112a, corresponding to about 1%. 

In Figure 9, we show 12 examples of new variable candidates which are matched to any
kind of known objects except variable stars in the SIMBAD database. CCDM J19302+0219AB
is not typed as a variable star in the SIMBAD database. But this object is a known
system of double stars (Dommanget & Nys 1994) which might have been affected by blending
in the NSVS data. 2MASS J19391065+0543500 is also not included as a variable star in the
SIMBAD database. But Usatov & Nosulchik (2008) suggest that this object is a red
asymptotic giant branch (AGB) variable star, supporting that the light curve shown in
Figure 9 exhibits true light variation. These examples show that our conservative
selection of variable candidates can catch real variable objects.

^7 http://simbad.u-strasbg.fr/simbad/sim-display?data=otypes
^8 http://www.aavso.org/vsx/

We also compare our variable candidates to those found by others using the NSVS data.
For example, 785 RR Lyrae candidates were reported already (Wils, Lloyd, & Bernhard
2006). Of these, 781 objects are included in our conservatively selected candidates, and
all 785 objects are found when we select variable candidates as objects having $D_M$
larger than the 99% cut. When we select variable candidates with the 95% $D_M$ cut, all
new $\beta$ Lyrae and Algol-type variable candidates from Hoffman et al. (2008) are
recovered with our method if they are included in our original NSVS samples. But we
recover 95% of them with the conservative selection of variable candidates. About 4700
variable candidates of other kinds such as $\delta$ Scuti stars and Cepheid objects were
also studied with the NSVS data (Hoffman, Harrison, & McNamara 2009). Again, our
selection with the 95% $D_M$ cut recovers most variable candidates except for objects
which have different light curves due to different definitions of good photometric data
and different systematic patterns in light curves (Hoffman, Harrison, & McNamara 2009).
These other studies use simple rules such as 0.1 mag dispersion of light curves, which
are conservative selection methods for specific types of variable candidates. Therefore,
the number of variable candidates is much larger in our approach than in others.



5.2. IRAS sources 



For the conservative selection of variable candidates, we find IRAS sources which are
spatially matched to the NSVS coordinates with a search radius of 6" in the IRAS point
source catalog (Helou & Walker 1988) and the IRAS faint source catalog (Moshir et al.
1990). Among all NSVS samples, 31,852 light curves have matching IRAS sources.
Considering the IRAS sources only for our conservative variable candidates, the number
is 12,987, which is about 41%.



We derive two colors of the IRAS sources by using the IRAS photometric flux at 12, 25,
and 60 $\mu$m ($F_{12\mu m}$, $F_{25\mu m}$, and $F_{60\mu m}$). The conventional
definition of the IRAS colors (e.g., van der Veen & Habing 1988; Olivier, Whitelock, &
Marang 2001; Sevenster 2002) is

$$ C_{12/25} = 2.5 \log \left( \frac{F_{25\mu m}}{F_{12\mu m}} \right), \quad C_{25/60} = 2.5 \log \left( \frac{F_{60\mu m}}{F_{25\mu m}} \right), \qquad (5) $$

where we do not apply any color corrections to the fluxes.



The IRAS colors have been commonly used for classification of infrared sources. In
particular, a two-color diagram like Figure 10 helps us understand which variable
candidates show variability that is relevant to the late stage of stellar evolution, such as
AGB stars with evolved circumstellar dust (Zuckerman 1987; van der Veen & Habing 1988;
Kwok, Volk, & Bidelman 1997; Ramos-Larios et al. 2009). In Figure 10, we plot the colors
of IRAS sources with the quality number Q = 3 at all three wavelengths 12, 25, and 60 μm
(Helou & Walker 1988). AGB stars dominate colors of C25/60 < -0.3, while planetary
nebulae and young stellar objects dominate -0.3 < C25/60 < 0.4 and 0.4 < C25/60, respectively
(Jackson, Ivezić, & Knapp 2002).
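The color ranges quoted above can be turned into a simple lookup. This is a rough sketch only: the ranges describe which population dominates a color interval, not a definitive classification of any single source, and the treatment of values exactly on a boundary is an assumption made here.

```python
def dominant_population(c25_60):
    """Return the population that dominates the given C25/60 color range."""
    if c25_60 < -0.3:
        return "AGB star"
    if c25_60 < 0.4:
        # Boundary values (-0.3 and 0.4) are not specified in the text;
        # assigning them to the redder class is an arbitrary choice.
        return "planetary nebula"
    return "young stellar object"
```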



In Figure 11, we show example light curves of variable candidates that have the
corresponding IRAS identifications. Although these IRAS sources are not known variable stars
in the SIMBAD database, some of them have been investigated in various ways without
variability information. IRAS 03534+6945 (NSVS 513536) and IRAS 23400+6320 (NSVS
1487299) were found as H-α emitting stars in Stephenson (1986) and Coyne & MacConnell
(1983), respectively. These sources are also found as possible variable sources in the VSX
catalog (see Appendix). IRAS 17203-1534 (NSVS 16483061) and IRAS 01005+7910 (NSVS
262162) are post-AGB stars which are sub-classified as hot post-AGB stars and high galactic
latitude supergiants (Szczerba et al. 2007), respectively. IRAS 01005+7910 has also been
observed with the Hubble Space Telescope, which found nebulae around it (Siódmiak et al.
2008).



5.3. AKARI sources 



Bright objects included in this paper are expected to be included in AKARI observations,
which have been conducted with two instruments, the Far-Infrared Surveyor (FIS) and the Infrared
Camera (IRC; Murakami et al. 2007). Both instruments produced all-sky source catalogs
which are much deeper and spatially better resolved than IRAS (Oyabu et al. 2010).

We match our NSVS samples to the AKARI/IRC All-Sky Point Source Catalog (Version
1.0; Ishihara et al. 2010a,b) and the AKARI/FIS All-Sky Survey Bright Source Catalog (Version
1.0; Yamamura et al. 2010), with a search radius of 6". The numbers of our NSVS samples
matching to the AKARI objects are 267,732 and 8,742 for the IRC and FIS catalogs,
respectively. This matching rate with the IRC catalog is much higher than that with
IRAS.



Figure 12 shows a color-color diagram of AKARI fluxes like Figure 10. In the plot,
we show objects identified as point sources with only good photometric observations of
AKARI IRC at 9 and 18 μm, and FIS at 65 μm, which are found with IRC photometric
flags of conditions q_S09 = 3, f09 = 0, X09 = 0, q_S18 = 3, f18 = 0, X18 = 0, and with
FIS photometric flags of conditions q_S65 > 1. The number of the NSVS objects with the
good photometric data is 374, while only 54 AKARI point-source objects with good
photometric data correspond to the variable candidates selected conservatively. The distribution
of AKARI colors is similar to that of IRAS colors because of the similar wavelength ranges
of the observation bands.
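The quality cuts listed above amount to a simple filter over the matched sources. The sketch below assumes each source is a dictionary whose keys are the flag names as written in the text (q_S09, f09, X09, q_S18, f18, X18, q_S65); these key names are illustrative and may differ from the column names in the actual AKARI catalogs.

```python
def has_good_akari_photometry(source):
    """Check the IRC 9/18 micron and FIS 65 micron quality conditions for one source."""
    irc_ok = all(
        source[f"q_S{band}"] == 3 and source[f"f{band}"] == 0 and source[f"X{band}"] == 0
        for band in ("09", "18")
    )
    fis_ok = source["q_S65"] > 1
    return irc_ok and fis_ok

def select_good_sources(sources):
    """Keep only the sources that pass all quality conditions."""
    return [s for s in sources if has_good_akari_photometry(s)]
```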

As we find with the IRAS colors, the variable candidates with the corresponding AKARI
objects might be long-period late-type stars (Ita et al. 2010). Figure 13 shows example light
curves of the variable and non-variable candidates in the SIMBAD database. AKARI IRC
200011367 (NSVS 1713088) corresponds to TYC 3668-112-1, which is also an IRAS object.
Even though this light curve does not have many observed data points, the light curve is
selected as a possible variable candidate in the NSVS field 013b. AKARI IRC 200843875
(NSVS 3393876) is close to IRAS 22164+6427, which might be the same object. The light
curve of AKARI IRC 200752804 does not have many observed data points, but its variation
seems reasonable because it is a post-AGB star or a protoplanetary nebula,
corresponding to HD 331319 and IRAS 19475+3119. The light curves of the other three
objects presented in Figure 13 do not correspond to any known objects in the SIMBAD
database, and are not selected as variable objects.



5.4. 2MASS and SDSS photometry 



Near-infrared (NIR) colors are also commonly used to identify basic properties of stars
and to separate non-stellar objects such as quasars. We match all variable candidates to
the 2MASS All-Sky Catalog of Point Sources (Skrutskie et al. 2006) with a search radius
of 6". A total of 1,439,381 variable candidates have corresponding 2MASS sources. Hereafter,
2MASS photometry data are given in the Vega system.

Figure 14 shows colors of the matched 2MASS objects with unblended, unsaturated,
and accurate photometry, which is described by read flag Rflg = 2, blend flag Bflg = 1, and
contamination and confusion flag Cflg = 0 in all three bands of 2MASS data (Covey et al.
2007), and with the separation between the NSVS and 2MASS positions less than 1". Most
variable candidates have colors similar to those of normal stars in our Galaxy, which are
mainly 0 < J - H < 1 (e.g., Finlator et al. 2000; Zoccali et al. 2003). Although quasars
have distinctive colors in J - H versus H - Ks, where most variable candidates are not found
(see Chiu et al. 2007; Kouzuma & Yamaoka 2010, for discussions), some fraction of variable
candidates might be quasars with colors around J - H ~ 0.9 and H - Ks ~ 0.3.



Late-type variable candidates such as red giant branch (RGB) and AGB stars can be
identified more reliably with the 2MASS colors. As shown in Figure 14, the dominant
NIR color of pulsating variable stars found in the Magellanic Clouds (Ita et al. 2004) is
different from the major colors of our variable candidates and known quasars. In particular,
J - Ks = 1.4 is the boundary between oxygen-rich and carbon stars (Nikolaev & Weinberg
2000; Cole & Weinberg 2002; Kiss & Bedding 2003). Therefore, our variable candidates with
colors similar to those of known variables might be pulsating RGB and AGB stars (Ita et al.
2004; Kouzuma & Yamaoka 2009).
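A minimal sketch of how the NIR colors and the J - Ks = 1.4 boundary could be applied to Vega-system 2MASS magnitudes is given below. The function names are hypothetical, and a color redder than the boundary only marks a candidate carbon-rich star; it is not a confirmed classification.

```python
def nir_colors(j_mag, h_mag, ks_mag):
    """Return the three 2MASS colors used in the color-color diagram."""
    return {
        "J-H": j_mag - h_mag,
        "H-Ks": h_mag - ks_mag,
        "J-Ks": j_mag - ks_mag,
    }

def is_carbon_rich_candidate(j_mag, ks_mag, boundary=1.4):
    """True when J - Ks is redder than the oxygen-rich/carbon-star boundary."""
    return (j_mag - ks_mag) > boundary
```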



By using the NIR color-color diagram given in Figure 14, we can also investigate
whether our variable candidates include possible obscured young stars with or without disks,
such as T Tauri stars (Meyer, Calvet, & Hillenbrand 1997; Tsujimoto et al. 2002; Ozawa, Grosso, & Montmerle
2005). If reddening is significant in some of our variable candidates, their colors might be
consistent with those of young stars.

Figure 15 shows six example light curves of the NSVS objects which have corresponding
2MASS measurements. 2MASS 18552297+0404353 (NSVS 13924374) is a Herbig Ae/Be
candidate star (Vieira et al. 2003) with a different name, PDS 551. The source 2MASS
09322353+1146033 (NSVS 10229563 and IRAS 09296+1159) is a post-AGB star (Blommaert, van der Veen, & Habing
1993). We find that its variation recorded in the NSVS light curve is regular with a period of
about 46.88 days.^1 2MASS 22230120+2216565 (NSVS 11767619) is confirmed as a carbon
star in spectroscopic observations by Mauron, Gigoyan, & Kendall (2007). The other objects
presented in Figure 15 have not been assigned a type.



We also match the NSVS coordinates of the variable candidates to the SDSS Data
Release 7 with a 6" search radius. The five bands of the SDSS photometric system have
been commonly used to identify stellar and non-stellar sources by using their distinctive
colors (e.g., Fan 1999; Fukugita et al. 2011). Combining the 2MASS photometry with the
SDSS photometry also helps us determine stellar source types precisely (e.g., Finlator et al.
2000). Among 438,087 variable candidates having corresponding SDSS photometric objects,
406,564 candidates also have corresponding 2MASS sources.



Figure 16 presents the distribution of colors for the variable candidates. We plot only
good SDSS photometric data and clean 2MASS photometric data as we explained earlier. In
particular, the matched objects with less than 1" distance are shown in the plot. The good
SDSS photometric data are defined as stellar (i.e., unresolved) objects without the SDSS
photometric flags EDGE, BLENDED, PEAKCENTER, NOPROFILE, COSMIC_RAY,
SATURATED, NOTCHECKED, DEBLENDED_AS_MOVING, SATUR_CENTER, INTERP_CENTER,
DEBLEND_NOPEAK, and PSF_FLUX_INTERP (Stoughton et al. 2002). We check these
flags in each SDSS band. Therefore, the number of objects shown in each panel of
Figure 16 varies for different color combinations. In all SDSS photometric data, we use PSF
magnitudes. We also find SDSS objects with limiting magnitudes m_u = 22.3, m_g = 23.3,
m_r = 23.1, m_i = 22.3, and m_z = 20.8.

^1 This period is found by using the tool provided in http://www.astro.lsa.umich.edu/~msshin/science/code/MultiSt
(Shin & Byun 2004).



The SDSS color-color diagram can be used to pick out probable RR Lyrae variables,
which are pulsating horizontal branch stars (Gautschy & Saio 1996). As suggested
in the theoretical prediction of colors for RR Lyrae stars in the SDSS photometric system
(Marconi et al. 2006), the following color ranges can be used to find RR Lyrae candidates
(Sesar et al. 2010):

$$0.75 < u - g < 1.45, \qquad (6)$$
$$-0.25 < g - r < 0.4, \qquad (7)$$
$$-0.2 < r - i < 0.2, \qquad (8)$$
$$-0.3 < i - z < 0.3, \qquad (9)$$

which are shown as boxes in Figure 16. A large number of variable candidates are identified
as F-, G-, and K-type stars in the figure.
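The color cuts in Equations (6) to (9) can be applied as a single boolean test. The sketch below reads the four cuts as ranges in u - g, g - r, r - i, and i - z, and assumes a dictionary of single-epoch PSF magnitudes keyed by band name; the names are illustrative only.

```python
# Color ranges read from Equations (6)-(9): (blue band, red band) -> (lower, upper).
RR_LYRAE_CUTS = {
    ("u", "g"): (0.75, 1.45),
    ("g", "r"): (-0.25, 0.40),
    ("r", "i"): (-0.20, 0.20),
    ("i", "z"): (-0.30, 0.30),
}

def is_rr_lyrae_color_candidate(mags):
    """True when all four colors fall strictly inside the RR Lyrae boxes.

    mags is a dict of magnitudes keyed by 'u', 'g', 'r', 'i', 'z'.
    """
    for (blue, red), (lower, upper) in RR_LYRAE_CUTS.items():
        color = mags[blue] - mags[red]
        if not (lower < color < upper):
            return False
    return True
```

Passing the cuts only selects a color candidate; as the text notes, F-, G-, and K-type stars populate much of the same diagram, so variability information is still needed.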

Among the variable candidates selected by the RR Lyrae color cuts, we present
example light curves of two objects, SDSS J105513.79+564747.5 (NSVS 2594623) and SDSS
J145313.21+421031.8 (NSVS 5152328), which have also been observed in the SDSS
spectroscopy, in Figure 17. Neither object is classified as a variable source in the
SIMBAD database. However, SDSS J105513.79+564747.5 is found variable in GALEX
observations (Welsh et al. 2005) as included in the VSX catalog (see Appendix). We can
estimate approximate periods of these two variables with the NSVS light curves as 0.541757
and 0.489448 days, respectively. These examples clearly show that the cadence of the
NSVS data is not high enough to derive complete light curves of these RR Lyrae variables,
which have short periods (Sterken & Jaschek 2005), except for a few NSVS objects with enough
data (Kinemuchi et al. 2006). Therefore, further follow-up observations of interesting NSVS
objects will be required to confirm their variability classes. SDSS J105513.79+564747.5
was identified as a probable RR Lyrae by Wheatley, Welsh, & Browne (2008), although
they could not retrieve a complete light curve from their GALEX observations. SDSS
J145313.21+421031.8 was also recognized as a blue horizontal branch star in the SDSS
observation (Sirko et al. 2004). These examples show that using colors of objects is
complementary to variability analysis for identifying object types.
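Folding a sparsely sampled light curve with an approximate period, as done for the two RR Lyrae candidates, can be sketched as below. This is a generic phase-folding routine under the assumption that times and the period share the same unit (days here); it is not the period-search tool used in the paper, and the reference epoch is an arbitrary choice.

```python
def fold_light_curve(times, mags, period, epoch=0.0):
    """Return (phase, magnitude) pairs sorted by phase, with phase between 0 and 1."""
    if period <= 0:
        raise ValueError("period must be positive")
    folded = [(((t - epoch) / period) % 1.0, m) for t, m in zip(times, mags)]
    return sorted(folded)
```

For example, `fold_light_curve(times, mags, 0.541757)` would fold a light curve with the first of the two approximate periods quoted above.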
Six other examples of variable candidates are presented in Figure 18, where NSVS
objects have corresponding reliable SDSS and 2MASS photometric data. Except for SDSS
J021532.23-104029.3, these objects have (g - i) colors without the SDSS bad photometric
flags. Spectral types of stars can be described approximately by (g - i) colors, where B0,
A0, F0, G0, K0, and M0 correspond to (g - i) = -0.94, -0.44, 0.09, 0.52, 0.83, and 1.95,
respectively (Covey et al. 2007). Most variable candidates are close to G5 as shown in Figure
16. Although (g - i) is generally a good proxy of spectral types, SDSS J155325.80+530924.2
(NSVS 5206326) was already confirmed spectroscopically as an M8 III star for its (g - i) =
5.81. We also note that this object is included in the VSX catalog as a variable star
(see Appendix). These examples reassert that the low cadence in the NSVS data does not
guarantee a certain classification of variable objects, requiring further follow-up observations
with different cadences.



5.5. SDSS and GALEX photometry 



Hot stellar objects such as white dwarfs and massive main-sequence stars are generally
not detected in NIR, but they can be more easily recognized over UV wavelength ranges
which cover most stellar flux. In the GALEX GR4,^2 we find objects matching our NSVS
variable candidates' coordinates within 6". When multiple GALEX objects are matched to
a single NSVS object, we choose the nearest GALEX object as the best match. A total
of 739,625 variable candidates, i.e., about 40% of the candidates, correspond to GALEX
photometric objects. A total of 286,185 candidates have corresponding SDSS objects too.

In Figure 19, we present colors of variable candidates with reliable GALEX and SDSS
photometric measurements. In addition to following the same conditions for the reliable
SDSS photometric measurements as in Section 5.4, we require that the GALEX objects
should have the distance from the center of the GALEX field of view < 0.6°, and both
FUV and NUV magnitudes < 25 (Agüeros et al. 2005; Maxted et al. 2009). The FWHM
angular resolution is about 6" in the NUV channel (Morrissey et al. 2005). Considering
the combined effects of poor spatial resolution in the NSVS and GALEX data, the spatial
association among the NSVS, SDSS, and GALEX objects needs a careful check when people
select interesting NSVS objects with the corresponding SDSS and GALEX objects together
in our database.

Colors of most variable candidates presented in Figure 19 are consistent with the
expected colors of normal stars (Seibert et al. 2005; Bianchi et al. 2007, 2009), considering the
Galactic extinction that makes overall colors red. The color distribution of known quasars
(Trammell et al. 2007) is well separated from that of stars in the diagram of (FUV - g)
and (g - i). Because our variable candidates are bright objects, they are likely to be
stars rather than quasars, even though some objects seem to have quasar-like colors. Hot white
dwarf candidates can be selected with the color cut of (FUV - NUV) < 0 and (g - r) < -0.2
(Agüeros et al. 2005). But we warn that the multiple matches of the NSVS coordinates to
both SDSS and GALEX catalogs have worse precision than a single match to either the SDSS or
GALEX catalogs. Therefore, the color combining both SDSS and GALEX photometric data
might not be reliable when the objects are faint or close to neighboring objects in the SDSS
and GALEX catalogs. Since the precision of the SDSS objects' coordinates is much better
than that of the GALEX catalogs, the SDSS colors are more reliable than the GALEX
colors in the color-color diagram combining both catalogs.

^2 http://galex.stsci.edu

Among the NSVS objects with the corresponding SDSS and GALEX objects, several
SIMBAD objects are found with further information about their properties. For example,
NSVS 7609761 corresponds to SDSS J115800.38+295731.4 and GALEX J115800.4+295731
with (FUV - NUV) = 1.63 and (g - i) = -0.55. This object is included in Brown et al.
(2008) as CHSS 835, which is a star with a spectral type of B8. In the diagram of (FUV - NUV)
and (NUV - r), NSVS 4819428 is known as a spectral type B subdwarf FBS 0839+399
(Wegner & McMahan 1988; Mickaelian 2008) with (FUV - NUV) = -0.35 and (NUV - r)
= -1.28. The light curves of these two objects given in Figure 20 do not show distinctive
features due to the poor sampling rate in the NSVS data. Flare-like variation can be
presumed from the light curve of NSVS 2744942, which is recognized as an active M dwarf,
corresponding to an X-ray object RX J1447.2+5701 (Mochnacki et al. 2002). The NSVS
light curve of this object shows about 1.5 mag variation even with the poor sampling rate.



Figure 20 also shows the light curves of the three variable candidates which have reliable
data of the SDSS g- and i-band photometry as well as the GALEX NUV measurement. None
of them have any different identification in the SIMBAD database. The light variation seems
real, but the low cadence in the NSVS data does not produce complete light curves with
distinctive types.



6. Summary and discussion 

A new systematic investigation of variable candidates in the NSVS data was presented
with a clustering method for time-series data. Assuming that the dominant fraction of light
curves represents non-variable objects, our method finds clusters of light curves with their
eight-dimensional features, and then finds how many light curves are included in each cluster.
When we choose as variable candidates the objects which are not included in the four largest
clusters and which have Dm above the 99% cut from the largest cluster, the total number of new
variable candidate light curves is 1,824,123 in our entire sample of the NSVS data.
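The selection rule summarized above can be sketched as a filter over per-light-curve results. The sketch assumes that each light curve already carries two precomputed quantities: the size rank of the cluster it belongs to (1 for the largest cluster) and the percentile of its Dm relative to the largest cluster. Both field names are hypothetical, and how the percentile is derived is defined earlier in the paper rather than reproduced here.

```python
def select_variable_candidates(light_curves, n_top_clusters=4, dm_percentile_cut=99.0):
    """Return ids of light curves outside the largest clusters and beyond the Dm cut.

    Each item is a dict with 'id', 'cluster_rank' (1 = largest cluster),
    and 'dm_percentile' (0-100, relative to the largest cluster).
    """
    return [
        lc["id"]
        for lc in light_curves
        if lc["cluster_rank"] > n_top_clusters and lc["dm_percentile"] > dm_percentile_cut
    ]
```

Lowering `dm_percentile_cut` gives a less conservative selection with the same code.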

The cross-correlation with IRAS, AKARI, 2MASS, SDSS, and GALEX catalogs helps us
to identify interesting objects with specific spectral types or variability classes (Eyer & Mowlavi
2008). In particular, variable stars over the instability strip can be selected easily from their
specific colors (e.g., see Figure 17). We also show examples of long-period variables which
can be selected from their IRAS, AKARI, or 2MASS colors (e.g., see Figures 11, 13, and
15).



Our analysis is presented online with the information on cross-correlations with other 
catalogs. Because the sampling pattern in the NSVS data is not good enough to identify 
detailed structures of different variability types, follow-up observations of variable candidates 
will be necessary to understand these variable sources by taking more photometric data 
points and improving the sampling rates. Moreover, some variable candidates such as IRAS 
sources might be new maser sources which are interesting objects in the radio region. 

Our approach of detecting variable candidates in the NSVS data is supplementary to
previous methods of finding variable candidates. We do not claim that this method is the best
way in all cases. Certainly, if observation systems, including instruments, environments, and
data reduction procedures, are well known a priori or are well controlled, supervised methods
can be superior to unsupervised methods like our approach, because supervised methods
can simulate observing systems with known variable and non-variable sources to find the
best separation between variable and non-variable sources. This separation can be applied
to detect new variable candidates in the test data. Therefore, when the data properties,
including all kinds of systematic patterns and real variability patterns, are understood and
modeled well, detecting variable sources becomes a classification problem instead of a
clustering problem.

Our analysis results can be used with many different methods of selecting variable
candidates. In this paper, our conservative selection method excludes objects in the four
largest clusters and objects with Dm < 99%. However, when people are interested in infrared
variable sources, they can choose as variable candidates the objects corresponding to IRAS objects
with Dm > 90%. If known variable objects can produce clusters with reasonable sizes,
finding clusters with many known variable objects can be an efficient way to find variable
candidates. Unfortunately, this approach is not feasible now because the number of known
variable objects is too small in each NSVS observation field.

Several variability surveys cover the same apparent magnitude ranges as the NSVS
does. A large fraction of the sky has already been observed in the All Sky Automated Survey
(ASAS; Pojmanski 1997) and SuperWASP (Street et al. 2003). Our approach of variability
detection can be applied to those data sets too. In addition, objects included in both the
NSVS and others can be combined to extend the span of time-series data or to complement
different sampling patterns. We plan to update the online database with these additional
data sets in the future. Furthermore, because almost all objects included in our analysis
will be monitored by the GAIA mission for five years (Cacciari 2009), our analysis will be
combined with the future time-series data and astrometric/kinematic information.

Our method can also be improved to catch much broader types of variable objects and
objects with weak variability signals. For this purpose, it is important to include various
features of light curves as we emphasized in Section 2. In particular, the usage of AoVM
as one feature of light curves is strongly limited because a complete form of a periodogram
contains more information about light curves. Therefore, it would be useful to develop new features
describing periodograms more completely if the new features can be estimated in
computationally cheap ways. Moreover, Stetson's I index (Stetson 1996) can be included in our
method if data include multi-band light curves. For instance, J and K can be derived in
each band, while I is estimated with multi-band light curves. Our usage of J can also be
changed to use all pairs of data points instead of using sequential pairs.
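The difference between sequential pairs and all pairs in the J index can be illustrated with a short sketch. It assumes the relative residual of Table 1, δ_n = sqrt(N/(N-1)) (x_n - μ)/e_n with μ the simple average, and sums sign(δ_i δ_j) sqrt(|δ_i δ_j|) over the chosen pairs without any weighting or normalization; that unnormalized form is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from itertools import combinations

def relative_residuals(mags, errs):
    """Return delta_n = sqrt(N/(N-1)) * (x_n - mean) / e_n for each data point."""
    n = len(mags)
    if n < 2 or n != len(errs):
        raise ValueError("need at least two points and one error per point")
    mean = sum(mags) / n
    scale = math.sqrt(n / (n - 1))
    return [scale * (m - mean) / e for m, e in zip(mags, errs)]

def stetson_j(mags, errs, all_pairs=False):
    """Sum of signed square roots of residual products over pairs of points.

    all_pairs=False uses sequential pairs (n, n+1); all_pairs=True uses every pair i < j.
    """
    deltas = relative_residuals(mags, errs)
    pairs = combinations(deltas, 2) if all_pairs else zip(deltas, deltas[1:])
    return sum(math.copysign(math.sqrt(abs(a * b)), a * b) for a, b in pairs)
```

Using all pairs increases the cost from N - 1 to N(N - 1)/2 products per light curve, which matters for a sample of this size.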
A. Known or suspected variables found in the VSX



The SIMBAD database is commonly used to identify known objects. We also use the
database to find all known variable objects and extragalactic objects as shown in Section
5.1. However, the SIMBAD database is not as complete as other catalogs of variable stars.
In particular, the AAVSO International Variable Star Index (VSX) catalog (Watson 2006) is
frequently updated with new reports of variable stars.

We check how many variable candidates selected by our conservative selection are not 
classified as variable sources in the SIMBAD database, but are included in the VSX catalog. 
Here, we use the catalog released online on Nov. 15, 2009, including 178,599 stars. Among 
1,824,123 variable candidates chosen by our conservative selection and not included in the 
SIMBAD database, we find that 41,019 objects are included as variable stars in the VSX 
catalog. These known variable stars in the catalog are largely included with references to the
ASAS observations (9,999 objects; Pojmanski 1997) and to the NSVS observations (15,958
objects) that we also use here. A total of 1,147 objects are matched to suspected variable
stars in the catalog. Interestingly, a total of 1,106 suspected variables are included with 
references to the NSVS light curves. Therefore, at least 1,783,104 variable candidates are 
newly selected in our method. In the online database, we provide links to the most recent 
VSX catalog for the NSVS light curves which we examine. 
Fig. 1. GMM results with respect to the number of observation epochs (left) and the
number of light curves (right). From top to bottom, each panel shows the number of groups,
the number fraction of the largest group, and the number fraction of the top three large
groups, respectively.

Fig. 2. Distributions of the DB index. When the DB index is low, groups are compact
and well separated from others. The NSVS field 100c shows the highest DB index ~ 29.

Fig. 3. Center coordinates of the largest cluster. Since Con is 1 in all fields, we do not
show its distribution here. The distribution shows that there are field-by-field variations of
systematic effects which produce variations of the largest cluster's central position in the
eight-dimensional space.

Fig. 4. Cumulative fraction of objects included in groups. The median fraction as a
function of the number of groups, in the order of decreasing size, is presented with the solid line, while dashed lines describe
the minimum and maximum fractions. For example, the top four large groups explain about
89% of light curves as the median fraction. The median fraction of about 99% is found with
the top ten large groups.

Fig. 5. Distributions of Dm. The peak of the distributions around Dm ~ 2 corresponds
to a mode value of Dm for a multivariate normal distribution (Ververidis & Kotropoulos
2008). Compared with the distribution for all objects (solid line), the field 147d (dotted
line), which has the smallest number of light curves, and the field 088d (dashed line), which
has the largest number of light curves, in our samples are more dominated by objects around
the largest cluster in the eight-dimensional space.

Fig. 6. Fraction of objects selected as variable candidates. As found in the different ranges
of the horizontal axis and the vertical axis, the size of variable candidates can be decreased
significantly by combining the constraints of Dm and group sizes together.

Fig. 7. Example light curves of known or suspected variable objects which are not selected
by our conservative variability detection. The SIMBAD names of the objects are given in
the top of each panel.

Fig. 8. Example light curves for known or suspected variable objects included in our
conservative variable candidates. The SIMBAD names of the objects are given in the top of
each panel.

Fig. 9. Example light curves of variable candidates matched to SIMBAD objects of
non-variable stars. Each light curve corresponds to the SIMBAD object with the name given
in the top of each panel. Although IRC +00434 and 2MASS J19391065+0543500 are not
found as known variable sources in the SIMBAD database, these objects are found in the
VSX catalog (Pojmanski 2002; Usatov & Nosulchik 2008) as we explain in the Appendix.

Fig. 10. IRAS two-color diagram of variable candidates. The solid line represents the
color of oxygen-rich Mira variables and variable OH/IR stars from van der Veen & Habing
(1988). Cross symbols correspond to objects shown in Figure 11.

Fig. 11. Example light curves of variable candidates that are IRAS sources. The IRAS
designations and (C12/25, C25/60) are presented in the top of each panel. Objects on the left
column have colors near the curve presented in Figure 10.

Fig. 12. AKARI two-color diagram of variable candidates and other objects. Following the
definition of the IRAS colors given in Equation (5), AKARI colors are defined by using fluxes
at its IRC 9, 18 μm and FIS 65 μm. Filled circles represent new variable candidates selected
with our conservative selection, while empty squares correspond to non-variable objects or
objects that are known variable stars or galaxies in the SIMBAD database.

Fig. 13. Example light curves of variable candidates (left) and non-variable objects (right)
that are AKARI sources. The AKARI designations in the IRC catalog and (C9/18, C18/65)
are presented in the top of each panel.

Fig. 14. 2MASS color-color diagram of variable candidates. The color distribution of
variable candidates (solid-line contours) is similar to the overall color distribution of
ordinary stars (Covey et al. 2007), while observed colors of 6658 QSOs with redshifts > 0.3
from Véron-Cetty & Véron (2006, dotted-line contours) are not similar to those of variable
candidates. The color distribution of known pulsating variable stars in the LMC and SMC
(dashed-line contours; Ita et al. 2004) implies that variable candidates redder than (J - Ks)
= 1.4 (dots) might be variable carbon-rich stars. The locus of classical T Tauri stars with
de-reddened colors is presented as a thick solid line (Meyer, Calvet, & Hillenbrand 1997).
Cross symbols correspond to objects shown in Figure 15. The Galactic extinction is not
considered here.

Fig. 15. Example light curves of the NSVS objects with the reliable 2MASS photometry.
The 2MASS designations and (J - H, H - Ks) are given in the top of each panel. 2MASS
18552297+0404353 is also PDS 551, which is a Herbig Ae/Be candidate star (Vieira et al.
2003). 2MASS 09322353+1146033 corresponds to IRAS 09296+1159.

Fig. 16. SDSS color-color diagrams of the variable candidates. Boxes represent the ranges
of single-epoch colors for RR Lyrae variable candidates from Sesar et al. (2010). Solid lines
in the panel of (g - r) and (r - i) colors represent (g - i) colors corresponding to spectral
types O5, A0, F0, G0, K0, M0, and M5 from left to right, which are derived from synthesized
stellar spectra (Covey et al. 2007). The Galactic extinction is not included here.

Fig. 17. Example light curves of the NSVS objects selected as RR Lyrae variable candidates
with the SDSS spectroscopic data. The left column shows the raw NSVS light curves, while
the right column presents the light curves folded with approximate periods of 0.541757 (top)
and 0.489448 (bottom) days, respectively.

Fig. 18. Example light curves of variable candidates with the reliable SDSS and 2MASS
photometric measurements. The top two objects are included in all color-color diagrams shown
in Figure 16. SDSS J155325.80+530924.2 is not classified as a known variable star in the
SIMBAD database. However, it is a spectroscopically confirmed giant star with another
designation, 2MASS J15532581+5309244 (Cruz et al. 2003), and is found as a variable star in
Woźniak et al. (2004b). SDSS J021532.23-104029.3 is identified as a field horizontal branch
star BPS CS 22175-0003 by Wilhelm et al. (1999).

Fig. 19. Color-color diagrams of variable candidates with the SDSS and GALEX
photometric data. The plots show variable candidates matching the SDSS objects within 1".
Contours correspond to the color distributions of quasars which are detected in both SDSS
and GALEX (Trammell et al. 2007).

Fig. 20. Example light curves of variable candidates with the reliable GALEX and SDSS
photometry. Three objects (left) have further identification in SIMBAD, while the other three
objects (right) are selected in the order of increasing (NUV - g) color from top to bottom.



Table 1. Variability indices. 



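Index    Definition

σ

γ1       $\frac{\sqrt{N(N-1)}}{N-2}\,\frac{\sum_{n=1}^{N}(x_n-\mu)^3/N}{\sigma^3}$

γ2       $\frac{N-1}{(N-2)(N-3)}\left\{(N+1)\left(\frac{\sum_{n=1}^{N}(x_n-\mu)^4/N}{\sigma^4}-3\right)+6\right\}$

Con      1 if $(x_n-\mu_0) > 2\sigma$ and $(x_{n+1}-\mu_0) > 2\sigma$ and $(x_{n+2}-\mu_0) > 2\sigma$;
         1 if $(x_n-\mu_0) < -2\sigma$ and $(x_{n+1}-\mu_0) < -2\sigma$ and $(x_{n+2}-\mu_0) < -2\sigma$;
         0 otherwise

η        $\sum_{n=1}^{N-1}(x_{n+1}-x_n)^2/(N-1)/\sigma^2$

J        $\sum_{n=1}^{N-1}\mathrm{sign}(\delta_n\delta_{n+1})\sqrt{|\delta_n\delta_{n+1}|}$

K

AoVM     The maximum value of the analysis of variance (ANOVA) statistic (Schwarzenberg-Czerny 1996)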



Note. σ, μ, γ1, γ2, and μ0 are standard deviation, average, skewness, kurtosis, and
median of N magnitudes $x_n$ in each light curve, respectively. $\delta_n$ is $\sqrt{N/(N-1)}\,(x_n-\mu)/e_n$
where $e_n$ is a photometric error for each data point. sign($\delta_n\delta_{n+1}$) is the sign of



